Q-Learning with Double Progressive Widening: Application to Robotics

Authors

  • Nataliya Sokolovska
  • Olivier Teytaud
  • Mario Milone
Abstract

Discretization of the state and action spaces is a critical issue in Q-Learning. We propose a real-time adaptation of the discretization based on the progressive widening technique, which has already been used in bandit-based methods. The results consistently converge to the optimum of the problem, without changing the parametrization for each new problem.
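The abstract does not give the details of the algorithm, so the following is only a minimal sketch of how progressive widening of an action set might be combined with tabular Q-learning; it is not the authors' method. It assumes states are already discrete cells, actions lie in [-1, 1], and the number of candidate actions kept for a state grows as ceil(c * n^alpha) with its visit count n. The names `allowed_size` and `WideningQLearner`, the constants `c` and `alpha`, and the uniform sampling of new actions are all hypothetical choices made for illustration.

```python
import math
import random


def allowed_size(visits: int, c: float = 1.0, alpha: float = 0.5) -> int:
    """Widening budget (assumed form): ceil(c * visits**alpha), at least 1."""
    return max(1, math.ceil(c * visits ** alpha))


class WideningQLearner:
    """Tabular Q-learning whose per-state action set widens with visits.

    Illustrative only: hyperparameters and the action range are assumptions.
    """

    def __init__(self, lr=0.1, gamma=0.95, epsilon=0.1, c=1.0, alpha=0.5, seed=0):
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.c, self.alpha = c, alpha
        self.rng = random.Random(seed)
        self.visits = {}  # state -> number of visits
        self.q = {}       # state -> {action: Q-value}

    def actions(self, state):
        """Count a visit, then add sampled actions until the budget is met."""
        n = self.visits.get(state, 0) + 1
        self.visits[state] = n
        table = self.q.setdefault(state, {})
        while len(table) < allowed_size(n, self.c, self.alpha):
            table[self.rng.uniform(-1.0, 1.0)] = 0.0  # new action starts at Q = 0
        return table

    def act(self, state):
        """Epsilon-greedy choice among the actions currently allowed."""
        table = self.actions(state)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(table))
        return max(table, key=table.get)

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update restricted to the known actions."""
        next_q = self.q.get(next_state)
        best_next = max(next_q.values()) if next_q else 0.0
        target = reward + self.gamma * best_next
        self.q[state][action] += self.lr * (target - self.q[state][action])
```

In this sketch, a state visited 100 times holds 10 candidate actions with the default constants, so the discretization becomes finer only where the agent actually spends time. A "double" variant would presumably apply the same kind of budget to the set of state representatives as well; that part is left out here.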

Similar resources

Q Learning based Reinforcement Learning Approach to Bipedal Walking Control

Reinforcement learning has been an active research area in recent years, not only in machine learning but also in control engineering, operations research, and robotics. It is a model-free learning control method that can solve Markov decision problems. Q-learning is an incremental dynamic programming procedure that determines the optimal policy in a step-by-step manner. It is an online procedure for...

Machine Learning for Autonomous Robotic Agents

We present some results of our research in the field of Machine Learning applied to robotics problems. In particular, we have investigated: (i) the application of Learning Classifier Systems to the synthesis of robot controllers; (ii) learning of fuzzy controllers; (iii) learning of purposeful representations of the environment; and (iv) the application of versions of Q-learning to robot trai...

Adding Double Progressive Widening to Upper Confidence Trees to Cope with Uncertainty in Planning Problems

Current state-of-the-art methods in energy policy planning only approximate the problem (Linear Programming on a finite sample of scenarios, Dynamic Programming on an approximation of the problem, etc.). Monte-Carlo Tree Search (MCTS [3]) seems to be a potential candidate to converge to an exact solution of these problems ([2]). But how fast, and how do key parameters (double/simple progressive ...

Continuous Upper Confidence Trees

Upper Confidence Trees are a very efficient tool for solving Markov Decision Processes; originating in difficult games like the game of Go, they are in particular surprisingly efficient in high-dimensional problems. It is known that they can be adapted to continuous domains in some cases (in particular continuous action spaces). We here present an extension of Upper Confidence Trees to continuous st...

-Learning: A Robotics Oriented Reinforcement Learning Algorithm

We present a new reinforcement learning system more suitable for use in robotics than existing ones. Existing reinforcement learning algorithms are not specifically tailored for robotics, and so they do not take advantage of robotic perception characteristics or of the expected complexity of the tasks that robots are likely to face. In a robot, the information about the environment come...

Publication date: 2011